March 12, 2003 



2.6 Bayes estimation. The definition of Bayes estimator is a special case of the general
definition of Bayes decision rule given in Sec. 1.3. Given a family $\{P_\theta, \theta \in \Theta\}$ of laws,
where $(\Theta, \mathcal{T})$ is a measurable space, a loss function $L(\theta, y)$, the risk for an estimator $U$ at $\theta$
defined by $r(\theta, U) := E_\theta L(\theta, U)$, and a prior $\pi$ defined on $(\Theta, \mathcal{T})$, an estimator $T$ is Bayes
for $\pi$ iff the Bayes risk $r(\pi, U) := \int r(\theta, U)\,d\pi(\theta)$, as $U$ ranges over all statistics, attains its
minimum at $U = T$. Recall that by Theorem 1.3.8, if a decision problem for a measurable family and
a given prior has a decision rule with finite risk and some decision rule $a(\cdot)$ minimizes the
posterior risk for almost all $x$, then it is Bayes. Recall also that if a family $\{P_\theta, \theta \in \Theta\}$
is dominated by a $\sigma$-finite measure $\nu$, we can choose $\nu$ equivalent to the family by Lemma
2.1.6. For squared-error loss, Bayes estimates are just expectations for the posterior:

2.6.1 Theorem. Let $\{P_\theta, \theta \in \Theta\}$ be a measurable family equivalent to a $\sigma$-finite measure
$\nu$. Let $\pi$ be a prior on $\Theta$ and $g$ a measurable function from $\Theta$ into some $\mathbb{R}^d$. Then for
squared-error loss, there exists a Bayes estimator for $g(\theta)$ if and only if there exists an
estimator $U$ for $g(\theta)$ with finite risk,

$$r(\pi, U) = \int\!\!\int |U(x) - g(\theta)|^2 \,dP_\theta(x)\,d\pi(\theta) < \infty.$$

Then a Bayes estimator is given by $T(x) := \int g(\theta)\,d\pi_x(\theta)$, where the integral with respect
to the posterior $\pi_x$ exists and is finite for $\nu$-almost all $x$. $T$ is the unique Bayes estimator
up to equality $\nu$-almost everywhere. Thus $T$ is an admissible estimator of $g(\theta)$.

Proof. Since $|\cdot|^2$ is the sum of squares of coordinates, we can assume $d = 1$. By
Propositions 1.3.5 and 1.3.13, the posterior distributions $\pi_x$ have the properties of regular
conditional probabilities of $\theta$ given $x$ as defined in RAP, Section 10.2.

"Only if" holds since by definition, a Bayes estimator has finite risk. To prove "if,"
let $U$ have finite risk, $r(\pi, U) < \infty$. Let $dQ(\theta, x) := dP_\theta(x)\,d\pi(\theta)$ be the usual joint
distribution of $\theta$ and $x$. Then the function $(\theta, x) \mapsto U(x) - g(\theta)$ is in $\mathcal{L}^2(Q)$, even though
possibly neither $x \mapsto U(x)$ nor $\theta \mapsto g(\theta)$ is. Thus $U(x) - g(\theta) \in \mathcal{L}^1(Q)$, since $Q$ is a
probability measure, and we have the conditional expectation (by RAP, Theorem 10.2.5)

$$E(U(x) - g(\theta) \mid x) = \int U(x) - g(\theta)\,d\pi_x(\theta) = U(x) - \int g(\theta)\,d\pi_x(\theta)$$

for $\nu$-almost all $x$, since $U(x)$ doesn't depend on $\theta$. Thus $T(x)$ is well-defined for $\nu$-almost all
$x$. Now $x \mapsto U(x) - T(x)$ is the orthogonal projection in $\mathcal{L}^2(Q)$ of $U(x) - g(\theta)$ into the space
$H$ of square-integrable functions of $x$ for $Q$ (RAP, Theorem 10.2.9), which is unique up to
a.s. equality (RAP, Theorem 5.3.8). Thus $\int (U(x) - g(\theta) - f(x))^2\,dQ(\theta, x)$ is minimized
over all square-integrable functions $f$ of $x$ when and only when $f(x) = U(x) - T(x)$ for
$\nu$-almost all $x$. For any other estimator $V(x)$ of $g(\theta)$ with finite risk, $U - V \in H$, since
$|U(x) - V(x)| \le |U(x) - g(\theta)| + |V(x) - g(\theta)|$. Taking $f = U - V$, we see that
$\int (V(x) - g(\theta))^2\,dQ(\theta, x)$ is minimized among all estimators $V(x)$ of $g(\theta)$ when $V = T$; in
other words, $T$ is a Bayes estimator of $g(\theta)$, unique up to $\nu$-almost everywhere equality.

A Bayes estimator or decision rule, unique up to almost sure equality for all $P_\theta$, is
always admissible by Theorem 1.2.5. □
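
As a small numerical check that Bayes estimates for squared-error loss are posterior expectations, the Python sketch below assumes a hypothetical finite grid of parameter values with a uniform prior and binomial observations ($n = 10$ trials, $k = 3$ successes, chosen only for illustration). It scans candidate estimates $a$ and finds that the posterior risk $\int (a - \theta)^2\,d\pi_x(\theta)$ is smallest next to the posterior mean $T(x) = \int \theta\,d\pi_x(\theta)$. The function names are illustrative, not from the text.

```python
from math import comb

def posterior_on_grid(thetas, prior, n, k):
    """Posterior weights pi_x(theta) on a finite grid of theta values, after
    observing k successes in n independent Bernoulli(theta) trials."""
    joint = [w * comb(n, k) * t**k * (1 - t) ** (n - k) for t, w in zip(thetas, prior)]
    total = sum(joint)
    return [j / total for j in joint]

def posterior_risk(a, thetas, post):
    """Posterior expected squared-error loss of the estimate a."""
    return sum(w * (a - t) ** 2 for t, w in zip(thetas, post))

# Hypothetical setup: uniform prior on a 19-point grid, n = 10 trials, k = 3 successes.
thetas = [i / 20 for i in range(1, 20)]
prior = [1 / len(thetas)] * len(thetas)
post = posterior_on_grid(thetas, prior, n=10, k=3)
post_mean = sum(w * t for t, w in zip(thetas, post))

# Scan a fine grid of candidate estimates; the minimizer should sit next to the posterior mean.
candidates = [i / 10000 for i in range(10001)]
best = min(candidates, key=lambda a: posterior_risk(a, thetas, post))
print(post_mean, best)
```

The minimizer is exact up to the spacing of the candidates because $\int (a - \theta)^2\,d\pi_x(\theta) = (a - T(x))^2 + \int (\theta - T(x))^2\,d\pi_x(\theta)$, a one-dimensional instance of the projection argument in the proof.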

Example. Let $\Theta = \mathbb{R}$ and consider the normal location family with $P_\theta = N(\theta, 1)$, $\theta \in \mathbb{R}$.
Let $\pi$ be the Cauchy prior, $d\pi(\theta) = d\theta/[\pi(1 + \theta^2)]$, $\theta \in \mathbb{R}$. Let $g(\theta) := \theta$ and $U(x) := x$.
Then for squared-error loss, $r(\theta, U) = 1$ for all $\theta$, so $r(\pi, U) = 1$ for any prior, specifically
the Cauchy prior. Since this risk is finite, Theorem 2.6.1 tells us that a Bayes estimator
exists. Note, however, that $g$ is not in $\mathcal{L}^2$ or even in $\mathcal{L}^1$ of the prior. We know that
conditional expectations given $x$, when they exist, are given by integrals with respect to
posterior distributions $\pi_x$. In this case, however, the expectation of $g(\theta)$ is undefined, and so
are its conditional expectations, but the integrals with respect to $\pi_x$ are defined and give the
Bayes estimator. It is easily seen that multiplication by the normal likelihood function
makes the integrals finite, since the posterior density is proportional to $e^{-(x-\theta)^2/2}/(1 + \theta^2)$.
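
The posterior mean in this Example can be computed numerically. The sketch below (Python, standard library only) evaluates $T(x) = \int \theta\,\phi(x - \theta)\,d\pi(\theta) / \int \phi(x - \theta)\,d\pi(\theta)$, with $\phi$ the standard normal density and $\pi$ the Cauchy prior, by composite Simpson's rule on $[x - 12, x + 12]$. The truncation width and the number of subintervals are assumptions chosen so that the neglected tails are negligible; they are not part of the text.

```python
from math import exp, pi, sqrt

def simpson(f, a, b, m=4000):
    """Composite Simpson's rule on [a, b] with m (even) subintervals."""
    h = (b - a) / m
    s = f(a) + f(b)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

def normal_pdf(z):
    return exp(-z * z / 2) / sqrt(2 * pi)

def cauchy_pdf(t):
    return 1 / (pi * (1 + t * t))

def bayes_estimate(x, width=12.0):
    """Posterior mean T(x) for P_theta = N(theta, 1) and the Cauchy prior.
    The likelihood factor is below 1e-31 when |theta - x| > width, so the
    integrals are truncated to [x - width, x + width]."""
    lo, hi = x - width, x + width
    num = simpson(lambda t: t * normal_pdf(x - t) * cauchy_pdf(t), lo, hi)
    den = simpson(lambda t: normal_pdf(x - t) * cauchy_pdf(t), lo, hi)
    return num / den

for x in (0.0, 1.0, 3.0, 10.0):
    print(x, bayes_estimate(x))
```

For $x > 0$ the values satisfy $0 < T(x) < x$: the Bayes estimator shrinks the observation toward 0, the center of the prior, and $T(-x) = -T(x)$ by symmetry.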

Given a parameter space $\Theta$ with a metric $d$ defined on it, where $(\Theta, d)$ is separable
and $\mathcal{T}$ is the Borel $\sigma$-algebra, and given a loss function, a decision rule $T$ will be called
Bayes admissible if there exists some prior $\pi$, with $\pi(U) > 0$ for every non-empty open set
$U \subset \Theta$, such that $T$ is Bayes for $\pi$.

Consider again the estimator of $p^2$ from two binomial trials given near the end of Sec.
2.5, which is unbiased and admissible for many loss functions, but gives the estimate 0 for
$p^2$ when one success is observed in the two trials. This estimator is not Bayes admissible.
Moreover, it is not Bayes for any prior $\pi$ such that $\pi((0, 1)) > 0$ for the open interval $(0, 1)$.
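
Both facts used here can be checked with exact rational arithmetic. The sketch below verifies that the estimator of $p^2$ from two trials which equals 1 when both trials succeed and 0 otherwise is unbiased, and, for one illustrative prior (the uniform prior on $[0, 1]$, assumed only for this check), computes the Bayes estimate of $p^2$ for squared-error loss. That estimate is strictly positive for every $k$, in particular $3/10$ when $k = 1$.

```python
from fractions import Fraction
from math import comb

def expectation(T, n, p):
    """E_p T(k) for k ~ Binomial(n, p); T is a list holding T(0), ..., T(n)."""
    return sum(T[k] * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1))

# The unbiased estimator of p^2 from n = 2 trials: 1 if both trials succeed, else 0.
T_unbiased = [Fraction(0), Fraction(0), Fraction(1)]

# E_p T is a polynomial of degree <= 2 in p, so agreement with p^2 at three
# distinct points proves unbiasedness for every p.
for p in (Fraction(1, 7), Fraction(1, 2), Fraction(5, 6)):
    assert expectation(T_unbiased, 2, p) == p**2

def bayes_p_squared_uniform(k, n):
    """Posterior mean of p^2 under the uniform prior on [0, 1]: the posterior is
    Beta(k + 1, n - k + 1), with second moment (k+1)(k+2) / ((n+2)(n+3))."""
    return Fraction((k + 1) * (k + 2), (n + 2) * (n + 3))

print([bayes_p_squared_uniform(k, 2) for k in range(3)])
```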

Other familiar estimators are not Bayes admissible. For example, to estimate $p$ given
that there were $k$ successes in $n$ independent trials with probability $p$ of success on each,
the classical estimator of $p$ is $k/n$, which is unbiased, sufficient and LS. It estimates that
$p = 0$ when $k = 0$ and $p = 1$ when $k = n$. These estimates make it not Bayes admissible.
A Bayes estimator for any prior giving positive probability to the open interval $(0, 1)$, for
squared-error loss, must make a strictly positive estimate even when $k = 0$ and an estimate
$< 1$ even when $k = n$. For the prior uniformly distributed over $[0, 1]$, the Bayes estimator
is $(k + 1)/(n + 2)$ which, of course, is not unbiased (e.g. for $p = 0, 1$), but on the whole
the possible bias of Bayes or Bayes admissible estimators seems a lesser fault than those
of unbiased estimators in examples such as we have just recalled.
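
For the uniform prior the comparison can be made exact. The sketch below computes the Bayes risk $\int_0^1 E_p (T(k) - p)^2\,dp$ of an estimator $T$ by expanding the square and using $\int_0^1 p^a (1-p)^b\,dp = a!\,b!/(a+b+1)!$. With $n = 5$ (an arbitrary illustrative choice) it gives $1/30$ for $k/n$ and $1/42$ for $(k+1)/(n+2)$, in line with the general values $1/(6n)$ and $1/(6(n+2))$, while the bias of $(k+1)/(n+2)$ at $p = 0$ is $1/(n+2)$.

```python
from fractions import Fraction
from math import comb, factorial

def beta_int(a, b):
    """Exact value of the integral of p^a (1 - p)^b over [0, 1], integers a, b >= 0."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))

def bayes_risk_uniform(T, n):
    """Bayes risk of the estimator k -> T(k, n) of p under squared-error loss and
    the uniform prior, computed exactly by expanding (T - p)^2 = T^2 - 2Tp + p^2."""
    total = Fraction(0)
    for k in range(n + 1):
        t = T(k, n)
        total += comb(n, k) * (t * t * beta_int(k, n - k)
                               - 2 * t * beta_int(k + 1, n - k)
                               + beta_int(k + 2, n - k))
    return total

def classical(k, n):
    return Fraction(k, n)            # k/n: unbiased, but equal to 0 when k = 0

def bayes_uniform(k, n):
    return Fraction(k + 1, n + 2)    # Bayes for the uniform prior

n = 5
print(bayes_risk_uniform(classical, n), bayes_risk_uniform(bayes_uniform, n))
# At p = 0 we have k = 0 almost surely, so the bias of (k+1)/(n+2) there is 1/(n+2).
print(bayes_uniform(0, n))
```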

Surprisingly, Bayes admissibility for squared-error loss is quite incompatible with unbiasedness:

2.6.2 Theorem. For any dominated, measurable family $\{P_\theta, \theta \in \Theta\}$, prior $\pi$ on $\Theta$ and
measurable real-valued function $g$ on $\Theta$, an unbiased estimator $T$ of $g$ is Bayes for $\pi$ and
squared-error loss if and only if it has risk $r(\pi, T) = 0$, so that $T(x) = g(\theta)$ $P_\theta$-almost
surely for $\pi$-almost all $\theta$.

Remarks. The condition of Bayes risk 0 is extremely restrictive: note that whenever
$g(\theta) \neq g(\phi)$, for $\pi$-almost all $\theta$ and $\phi$, the laws $P_\theta$ and $P_\phi$ must be singular (the opposite
extreme from an equivalent family). So, for any equivalent family, a 1-1 function $g$ has no
unbiased Bayes estimator for any prior which is not trivial (concentrated at one point).

Proof. "If" is clear. To prove "only if," note that by definition of Bayes decision rule (Section
1.2), $T$ must have finite risk. Thus by Theorem 2.6.1, $T(x) = \int g(\theta)\,d\pi_x(\theta)$. Let $\mu$ be
the "predictive" distribution for $x$, in other words its marginal distribution under $Q$, as
defined just before (1.3.6). Then $dQ(\theta, x) = d\pi_x(\theta)\,d\mu(x)$. We have

$$r(\pi, T) = \int\!\!\int T(x)^2 - 2T(x)g(\theta) + g(\theta)^2 \,d\pi_x(\theta)\,d\mu(x) = \int \Big[ T(x)^2 - 2T(x)^2 + \int g(\theta)^2\,d\pi_x(\theta) \Big]\,d\mu(x).$$

Since $r(\pi, T) < \infty$, we have $\int g(\theta)^2\,d\pi_x(\theta) < \infty$ for $\mu$-almost all $x$, and

$$r(\pi, T) = \int g(\theta)^2 - T(x)^2 \,dQ(\theta, x).$$

Similarly, by unbiasedness,

$$r(\pi, T) = \int\!\!\int T(x)^2 - 2T(x)g(\theta) + g(\theta)^2 \,dP_\theta(x)\,d\pi(\theta) = \int \Big[ \int T(x)^2\,dP_\theta(x) - 2g(\theta)^2 + g(\theta)^2 \Big]\,d\pi(\theta),$$

so $r(\pi, T) < \infty$ implies $\int T(x)^2\,dP_\theta(x) < \infty$ for $\pi$-almost all $\theta$, and

$$r(\pi, T) = \int T(x)^2 - g(\theta)^2 \,dQ(\theta, x) = -r(\pi, T),$$

so $r(\pi, T) = 0$, finishing the proof.



□ 
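
The two expressions for $r(\pi, T)$ in this proof can be seen separately in the binomial example with the uniform prior. The sketch below (an illustration under these assumptions, with $g(p) = p$ and $n = 5$) checks that the posterior mean $(k+1)/(n+2)$, which is Bayes, has $r(\pi, T) = \int g^2\,dQ - \int T^2\,dQ$, while the unbiased estimator $k/n$ has $r(\pi, T) = \int T^2\,dQ - \int g^2\,dQ$. Both risks are positive, so neither estimator satisfies both identities, consistent with Theorem 2.6.2.

```python
from fractions import Fraction
from math import comb, factorial

def beta_int(a, b):
    """Exact value of the integral of p^a (1 - p)^b over [0, 1], integers a, b >= 0."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))

def joint_moments(T, n):
    """Under Q (p uniform on [0, 1], then k ~ Binomial(n, p)), return the triple
    (E_Q T(k)^2, E_Q p^2, E_Q (T(k) - p)^2) for the estimator T of g(p) = p."""
    ET2 = sum(comb(n, k) * T(k, n) ** 2 * beta_int(k, n - k) for k in range(n + 1))
    Eg2 = Fraction(1, 3)                 # integral of p^2 over [0, 1]
    risk = sum(comb(n, k) * (T(k, n) ** 2 * beta_int(k, n - k)
                             - 2 * T(k, n) * beta_int(k + 1, n - k)
                             + beta_int(k + 2, n - k)) for k in range(n + 1))
    return ET2, Eg2, risk

def unbiased(k, n):
    return Fraction(k, n)

def posterior_mean(k, n):
    return Fraction(k + 1, n + 2)

for name, T in (("posterior mean", posterior_mean), ("unbiased k/n", unbiased)):
    ET2, Eg2, risk = joint_moments(T, 5)
    print(name, risk, Eg2 - ET2, ET2 - Eg2)
```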



PROBLEMS 

1. Let $\Gamma(a) := \int_0^\infty x^{a-1} e^{-x}\,dx$ for $a > 0$, and $f_{\lambda,a}(x) := \lambda^a x^{a-1} e^{-\lambda x}/\Gamma(a)$ for $x > 0$,
$f_{\lambda,a}(x) := 0$ for $x \le 0$, where $a > 0$ and $\lambda > 0$. Then $f_{\lambda,a}$ is a gamma probability density with
scale parameter $\lambda$ and shape parameter $a$. Let the parameter $\lambda$ of a Poisson distribution
with $P(X = k) = \lambda^k e^{-\lambda}/k!$ for $k = 0, 1, \ldots$ have a prior distribution $e^{-\lambda}\,d\lambda$ for $\lambda > 0$
(standard exponential distribution). Let $k$ be the observed value of the Poisson random
variable. What is the posterior distribution of $\lambda$?

2. For the binomial distribution with $n$ trials and probability $p$ of success, the variance is
$np(1 - p)$. More specifically, suppose we observe $n$ i.i.d. Bernoulli variables $X_1, \ldots, X_n$
where $P(X_j = 1) = p = 1 - P(X_j = 0)$ for each $j$. Let $k := \sum_{j=1}^n X_j$ be the number
of successes.

(a) Show that the usual unbiased estimator $s^2$ of variance (for general distributions)
equals $(k - k^2/n)/(n - 1)$ in this case.

(b) Show that $s^2$ is admissible in this case for any convex loss function. Hint: see the
proof of Proposition 2.5.18.

(c) Show that $s^2$ is not Bayes admissible for squared-error loss in this case. Hint: It
is unbiased.

3. "Unbiased estimation on a circle." Let the sample space be the unit circle $S^1 :=
\{(x, y) : x^2 + y^2 = 1\}$. Given $n$ points on the circle all within half the circle (some
arc of $\pi$ radians), assign them angular coordinates in an interval of length $\pi$ radians. Any
real number $\theta$, viewed as an angle, defines a point $s_\theta := (\cos\theta, \sin\theta) \in S^1$. For
example, if the observations were $(1, -1)/2^{1/2}$ and $(1, 1)/2^{1/2}$, a suitable interval would
be $-\pi/2 < \phi < \pi/2$ but not $0 \le \phi < \pi$. Average these coordinates to get a $\bar\theta$ and then
an estimator $T := s_{\bar\theta} \in S^1$.

(a) Show that $T$ is a well-defined statistic.

(b) For any random variable $(X, Y)$ taking values within some half-circle, with distribution
depending on some parameter $\theta$, define its circular expectation $E_{\circ,\theta}(X, Y)$ as
a point in $S^1$ by a choice of angular coordinate as above. Show that $E_{\circ,\theta}(X, Y)$ is
well-defined. Let $\{P_\theta : 0 \le \theta < 2\pi\}$ be the family of laws which are uniform on arcs
of length $\pi$, given by $[\theta - \pi/2, \theta + \pi/2]$.

(c) Show that $T$ is unbiased for circular expectation for the given family $\{P_\theta : 0 \le
\theta < 2\pi\}$ in the sense that $E_{\circ,\theta} T \in S^1$ is equal for any $\theta$ to $s_\theta$.

(d) Find the risk of $T$ for each $\theta$ where the loss function is $\min_{k \in \mathbb{Z}} (\bar\theta - \theta - 2k\pi)^2$.
Hint: $|\bar\theta - \theta - 2k\pi| < \pi/2$ for a unique $k \in \mathbb{Z}$.
Reduce to a question about real-valued uniform random variables.

4. In the same situation as the previous problem, for $n$ i.i.d. observations and a given
choice of coordinates, take the order statistics $\theta_{(1)} < \theta_{(2)} < \cdots < \theta_{(n)}$. Let
$U := (\theta_{(1)} + \theta_{(n)})/2$.

(a) Show that $(s_{\theta_{(1)}}, s_{\theta_{(n)}})$ form a sufficient statistic in $S^1 \times S^1$.

(b) Show that $s_U$ is well-defined and is an unbiased estimator of $s_\theta$ in the same sense
as $T$ in Problem 3.

(c) Find the risk of $U$ for each $\theta$ as in Problem 3 and show it is smaller than that of
$T$.

5. Let $\{P_\theta : -\infty < \theta < \infty\}$ be a location family in $\mathbb{R}$ where $P_\theta$ has a density $f(\theta, x) :=
g(x - \theta)$ with respect to Lebesgue measure $dx$ for a fixed probability density $g$. An
estimator $T$ for $\theta$ is called equivariant if for any $h \in \mathbb{R}$, the distribution of $T + h$ for
$P_\theta$ is the same as the distribution of $T$ for $P_{\theta+h}$. Show that an equivariant estimator
of $\theta$ can't be Bayes for squared-error loss for any prior. Hint: an equivariant estimator
minus a constant would be an unbiased estimator, and being Bayes implies that the
estimator itself is unbiased. Apply Theorem 2.6.2. Then $\{x : g(x) > 0\}$ has Lebesgue
measure $> 0$ and so cannot be disjoint from all its translates (RAP, Prop. 3.4.3).

6. An estimator $V = V(u_1, \ldots, u_n)$ taking $(S^1)^n$ into $S^1$ for the unit circle $S^1$ is called
equivariant if for any rotation $R$ of $S^1$, $V(Ru_1, \ldots, Ru_n) = R(V(u_1, \ldots, u_n))$. Show
that both estimators $T$ and $s_U$ in Problems 3 and 4 are equivariant.

7. Prove or disprove: the estimator $s_U$ in Problem 4 is Bayes for the uniform prior on the
circle and the loss function in Problem 3(d).

8. Suppose the probability $p$ of success in $n$ independent trials has a prior which is a beta
density $p^{a-1}(1 - p)^{b-1}/B(a, b)$ for $0 < p < 1$ with respect to Lebesgue measure. Here
$a > 0$, $b > 0$, and $B(a, b) = \int_0^1 x^{a-1}(1 - x)^{b-1}\,dx$. Recall that $B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a + b)$
where $\Gamma(a) := \int_0^\infty x^{a-1} e^{-x}\,dx$ for $a > 0$ and $\Gamma(m) = (m - 1)!$ for $m = 1, 2, \ldots$. If $k$
successes are observed in the $n$ trials,

(a) What is the posterior distribution of $p$?

(b) What is the Bayes estimator of $p$, for squared-error loss?

NOTES 

Theorems 2.6.1 and 2.6.2 are stated in Lehmann (1991), Corollary 4.1.1, p. 239, and
Theorem 4.1.2, pp. 244-245, but for the former, Lehmann writes the Bayes estimator as
$T(x) = E(g(\theta) \mid x)$, implicitly assuming that $\int |g(\theta)|\,d\pi(\theta) < \infty$. Lehmann's proof of the
latter theorem uses the assumption that $g(\theta) \in \mathcal{L}^2(\pi)$. The Example given after Theorem
2.6.1 shows that these assumptions need not hold. Thus the proofs of Theorems 2.6.1 and
2.6.2 given here avoid any moment assumptions.

Lehmann apparently doesn't give any earlier references for these facts, although at
least Theorem 2.6.1 for $g \in \mathcal{L}^2$ was presumably known well before 1983.

Bickel and Doksum (1977), Theorem 1.6.1 and (10.3.1), state that the Bayes estimator
of $g(\theta)$, if it exists, must be $E(g(\theta) \mid x)$. Again, this is correct only when the expectation
and thus the conditional expectation are defined.

Berger (1985, Sec. 4.4.2, p. 161) correctly states that the Bayes estimator for squared-error
loss is the expectation for the posterior distribution, in the special case $g(\theta) = \theta$,
under the assumption that each of three integrals for the posterior distribution is finite (as
they will be, almost surely, under the assumption of Theorem 2.6.1).